NTpred FRAMEWORK FOR TYROSINE NITRATION INFERENCE ON UNSEEN DATA:

Users can paste raw sequences into the text field or upload a FASTA file containing the sequences.

  • Sequences entered in the text field should be raw sequences separated by commas.
  • Each record in the input FASTA file should contain a sequence name and a sequence.
  • Furthermore, the framework predicts Tyrosine nitration at every candidate site in an input sequence: it produces one prediction for each Y (tyrosine) residue, identified by the position of that Y in the sequence.
    • Let us assume the following sample input sequence in the FASTA file:
      > sample_1
      ITILSYHSSIGVRKDELVHGYILVYSAKRKASMGMLRAFLS
    • In this sample, Y occurs at locations 6, 21 and 25.
    • Then, the output prediction CSV file will have 3 rows for this sequence:
      • Sequence_name, Sequence, Probability, Class
      • sample_1__6, ITILSYHSSIGVRKDELVHGYILVYS, 0.1057, 0
      • sample_1__21, ITILSYHSSIGVRKDELVHGYILVYSAKRKASMGMLRAFLS, 0.0611, 0
      • sample_1__25, SYHSSIGVRKDELVHGYILVYSAKRKASMGMLRAFLS, 0.06139, 0
    • The Sequence_name column holds the name from the input FASTA file with "__location" appended, where location is the position of the Y residue in the protein sequence.
  • The framework's predictions are saved in a CSV file that can be downloaded.
    • The CSV file contains four columns: Sequence_name, Sequence, Probability, Class.
    • The Sequence_name column gives the name of the sequence.
    • The Sequence column gives the sequence used for the prediction.
    • The Probability column gives the predicted probability that the site is a Tyrosine nitration site.
    • The Class column translates the probability into a class label, where 1 denotes a positive Tyrosine nitration site and 0 denotes a negative one.
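
The per-residue expansion and the output columns described above can be sketched in Python. This is only an illustration, not NTpred's own code: the helper names (tyrosine_windows, split_row_name, read_predictions) are hypothetical, and the 20-residue flank is an assumption inferred from the three sample rows, where each Sequence value spans at most 20 residues on either side of the Y. The CSV text in the demo is copied from the sample rows, so the real file's exact formatting may differ.

```python
import csv
import io

FLANK = 20  # assumption: inferred from the sample rows, not a documented NTpred setting


def tyrosine_windows(name, sequence, flank=FLANK):
    """Yield (row_name, window) for every Y in the sequence, using 1-based positions."""
    sequence = sequence.upper()
    for index, residue in enumerate(sequence):
        if residue != "Y":
            continue
        position = index + 1  # 1-based location, as in "sample_1__6"
        start = max(0, index - flank)
        end = min(len(sequence), index + flank + 1)
        yield f"{name}__{position}", sequence[start:end]


def split_row_name(row_name):
    """Split 'sample_1__6' into ('sample_1', 6) at the last '__'."""
    name, _, position = row_name.rpartition("__")
    return name, int(position)


def read_predictions(handle):
    """Read the four-column prediction CSV into a list of typed dictionaries."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    predictions = []
    for row in reader:
        name, position = split_row_name(row["Sequence_name"])
        predictions.append({
            "name": name,
            "position": position,
            "probability": float(row["Probability"]),
            "label": int(row["Class"]),
        })
    return predictions


rows = list(tyrosine_windows("sample_1", "ITILSYHSSIGVRKDELVHGYILVYSAKRKASMGMLRAFLS"))
for row_name, window in rows:
    print(row_name, window)

# Two of the sample output rows, copied from the description above.
sample_csv = """Sequence_name, Sequence, Probability, Class
sample_1__6, ITILSYHSSIGVRKDELVHGYILVYS, 0.1057, 0
sample_1__25, SYHSSIGVRKDELVHGYILVYSAKRKASMGMLRAFLS, 0.06139, 0
"""
predictions = read_predictions(io.StringIO(sample_csv))
```

With the assumed flank of 20, the three windows produced for sample_1 match the Sequence values shown in the sample output rows. Splitting the name at the last "__" keeps sequence names that contain single underscores intact.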

TRAINING THE HYBRID ENSEMBLE ARCHITECTURE FROM SCRATCH:

  • The NTpred framework can be used to run experiments in both k-fold cross-validation and independent-test settings. For both settings, training data should be provided in standard FASTA format.
  • The FASTA record header should follow the format:
    • Sequence_Name|Class|Label. For example: sample_1|0|training
    • Sequence_Name should be unique.
    • Class should be either 1 or 0, marking the sequence as a positive or a negative site.
    • Label is an arbitrary placeholder value.
  • Users can choose the "Kfold" or the "Standard" training mode.
    • "Kfold" mode performs a k-fold evaluation of the NTpred framework on the provided training data.
    • "Standard" mode supports the independent-test setting. A model is trained on the user-provided training data and deployed, and the user can then use it for prediction.
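
Before uploading training data, it can help to check the headers against the rules above. The following Python sketch shows one way to do that, together with a plain k-fold split. All function names are hypothetical, the second demo record is made up for illustration, and the split is a generic unshuffled k-fold; how NTpred itself assigns sequences to folds is not stated here.

```python
def parse_header(header):
    """Parse 'Sequence_Name|Class|Label' and return (name, class_as_int)."""
    fields = header.lstrip(">").strip().split("|")
    if len(fields) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(fields)}: {header!r}")
    name, cls, _placeholder = fields  # the third field is a placeholder and is ignored
    if cls not in ("0", "1"):
        raise ValueError(f"Class must be 0 or 1, got {cls!r}")
    return name, int(cls)


def parse_training_fasta(text):
    """Return {name: {'class': int, 'sequence': str}}, enforcing unique names."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name, cls = parse_header(line)
            if name in records:
                raise ValueError(f"duplicate Sequence_Name: {name}")
            current = name
            records[name] = {"class": cls, "sequence": ""}
        elif current is None:
            raise ValueError("sequence line found before any header")
        else:
            records[current]["sequence"] += line  # join wrapped sequence lines
    return records


def kfold_indices(n_items, k):
    """Yield (train, test) index lists for k contiguous folds (generic, unshuffled)."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size


# Demo input: sample_2 and its sequence are invented for this example.
demo = """>sample_1|0|training
ITILSYHSSIGVRKDELVHGYILVYSAKRKASMGMLRAFLS
>sample_2|1|training
ACDEFGHIKLMNPQRSTVWY
"""
records = parse_training_fasta(demo)
folds = list(kfold_indices(5, 2))
```

In practice a shuffled or class-stratified split is usually preferable for imbalanced data; the contiguous version is used here only to keep the example deterministic.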